Comparing emotion inferences from dogs (Canis familiaris), panins (Pan troglodytes/Pan paniscus), and humans (Homo sapiens) facial displays

Human beings are highly familiar over-learnt social targets, with similar physical facial morphology between perceiver and target. But does experience with or similarity to a social target determine whether we can accurately infer emotions from their facial displays? Here, we test this question across two studies by having human participants infer emotions from facial displays of: dogs, a highly experienced social target but with relatively dissimilar facial morphology; panins (chimpanzees/bonobos), inexperienced social targets, but close genetic relatives with a more similar facial morphology; and humans. We find that people are more accurate inferring emotions from facial displays of dogs compared to panins, though they are most accurate for human faces. However, we also find an effect of emotion, such that people vary in their ability to infer different emotional states from different species’ facial displays, with anger more accurately inferred than happiness across species, perhaps hinting at an evolutionary bias towards detecting threat. These results not only compare emotion inferences from human and animal faces but provide initial evidence that experience with a non-human animal affects inferring emotion from facial displays.

, and happy and neutral inferences from dog faces (LSD M diff = 0.07, SE = 0.03, p = .005, 95% CI [0.02, 0.12]) did not differ in accuracy, but participants were significantly better at inferring happy and neutral from dog faces than the two avoidant negative emotions. This suggests that participants struggled to infer negative avoidant emotions from facial displays in dog faces.
For panin faces, we found that participants were best at inferring neutral, then happy, then fearful, then sad from facial displays (all p's < .001).
When further considering the emotion × species interaction, there is a consistent pattern of highest accuracy for inferences of emotion from human, then dog, then panin faces across all facial displays (all p's < .001), except for neutral inferences from dog and panin faces which did not differ; LSD M diff = 0.04, SE diff = 0.02, p = .038, 95% CI [0.002, 0.09].
Correlations. We found a significant correlation between experience and confidence for inferences of emotion from dog facial displays, r (143) = 0.29, p < .001, such that more experience was related to increased confidence. We did not observe a significant correlation for panins, r (143) = 0.11, p = .181.
We also found a significant correlation between dog experience and reaction time, r (143) = 0.38, p < .001, such that participants more experienced with dogs took longer to infer emotions from their facial displays. We did not observe a significant correlation for panins, r (143) = 0.01, p = .905. This is perhaps an artefact of the circumstances in which participants encounter the animals, with dog-human interactions likely to occur in domestic contexts and panin-human interactions likely to occur in captive contexts.
We also found significant correlations between experience and accuracy for both dogs, r (131) = 0.19, p = .032, and panins, r (143) = 0.27, p = .001, such that participants more experienced with either species were more accurate at inferring emotion from that species' facial displays. This is consistent with both theorising and previous literature.
Finally, we also found a significant correlation between confidence in inferring human facial displays and reaction time, r (145) = −0.22, p = .006, such that people who were more confident were faster to categorise human facial displays. We did not find significant correlations between confidence and reaction time for any other species (all p's > .050). We also did not find significant correlations between confidence and accuracy for any species (all p's > .050). This suggests that a meta-cognitive evaluation was only relevant for human faces, the species participants felt most confident categorising.
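The correlations above are reported as r (df), where the degrees of freedom equal the number of paired observations minus two. The following minimal Python sketch illustrates how r, its degrees of freedom, and the associated t statistic relate to one another. The function names and the six data points are hypothetical and chosen only for illustration; this is not the SPSS procedure used in the studies.

```python
import math


def pearson_r(x, y):
    """Return (r, df) for paired scores, with df = n - 2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy), n - 2


def t_from_r(r, df):
    """t statistic for testing r against zero: t = r * sqrt(df / (1 - r^2))."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt(df / (1 - r ** 2))


# Hypothetical data: an experience code and a confidence rating (0-100)
# for six imaginary participants.
experience = [0, 1, 1, 2, 3, 4]
confidence = [40, 55, 50, 70, 65, 90]
r, df = pearson_r(experience, confidence)
print(f"r({df}) = {r:.2f}, t = {t_from_r(r, df):.2f}")

# The reported dog experience-confidence correlation, r(143) = 0.29,
# corresponds to a t statistic of roughly 3.6 on 143 degrees of freedom.
print(round(t_from_r(0.29, 143), 1))
```

Under this convention, a report of r (143) implies 145 participants with complete data on both measures.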

Study 2
We followed Study 1 with a second study aimed at replicating the findings and broadening the emotion inferences. The most substantive change was reducing the number of emotions to three, replacing fear and sadness with anger. Though anger is also a negatively valenced emotion, it is an approach emotion, and a conspecific with a facial display of anger and directed gaze is a threat, unlike one displaying sadness or fear. We kept most other aspects of the study the same (see the Methods Section for other minor differences).

Results
Accuracy scores. We found a significant main effect of species, Wald χ² (2) = 529.85, p < .001, such that Both main effects were qualified by a significant species × emotion interaction, Wald χ² (4) = 83.77, p < .001 (see Fig. 2). To unpack this interaction, we consider comparisons that were not significant within species (see Table 2 for all simple effect results). Accuracy rates for happy and angry inferences from human faces were not different (LSD M diff = 0.04, SE = 0.02, p = .051, 95% CI [0.00, 0.07]), but neutral inferences from human faces were significantly less accurate than both. This suggests that participants performed equally well when inferring happy and angry from human facial displays, but were less accurate when inferring neutral from facial displays, replicating the finding from Study 1.
Inferences of happy and neutral from dog faces did not significantly differ in accuracy (LSD M diff = 0.04, SE = 0.04, p = .285, 95% CI [−0.03, 0.11]), but participants were significantly more accurate inferring angry than happy or neutral from dog faces. This suggests that participants did not struggle to infer negative approach emotions from facial displays in dogs. Coupled with the results of Study 1, it suggests that positive approach emotions are better inferred from dog faces than negative avoidant emotions, though there seems to be a boost for anger, a negative approach emotion, potentially as a threat signal.
For panin faces, we found that accuracy did not differ when participants inferred angry and neutral, LSD

Reaction times (RT).
There were no significant main effects or interactions for the reaction time measure, replicating the results of Study 1.  ). This suggests that participants were most confident inferring emotion from human facial displays followed by dog facial displays, and least confident inferring emotion from panin facial displays. These results replicate the findings in Study 1.

Correlations.
We found a significant correlation between dummy coded experience and confidence ratings for emotion inferences from panin faces, r (130) = 0.19, p = .027, as well as for dog faces, r (130) = 0.32, p < .001, such that more experience was related to increased confidence. We found a significant correlation between dummy coded experience and reaction time when inferring emotion from dog facial displays, r (130) = 0.20, p = .020 such that participants who were more experienced with dogs took longer to infer emotion from their facial displays. The time experience variable did not significantly correlate with any other measures for dogs or panins, and we did not find a significant correlation between dummy coded experience and accuracy, reaction time, or confidence for emotion inferences from panin facial displays (all p's > .050). We also found a significant correlation between confidence in inferring dog facial displays and reaction time, r (130) = 0.32, p < .001, such that participants who were more confident of their inference also took longer. No other correlation between experience and accuracy reached significance when inferring emotion from dog facial displays, and there was no significant correlation between confidence and accuracy or reaction time (all p's > .050) when inferring emotion from human facial displays.

Discussion
The primary aim of this study was to investigate whether experience or similarity best explains inferring emotions from facial displays of non-human animals. To test this, we pitted a domesticated and experienced species with very different facial morphology to humans (dogs) against species with a more similar facial morphology but much less likely to be encountered by our participants (panins). We found evidence that participants were more accurate inferring emotions from dog than panin facial displays. Correlational evidence suggests that higher experience with either animal is associated with increased accuracy inferring emotion from that animal's facial displays. This suggests that people are using their past experiences with the animal to infer emotions from facial displays.
We also explored emotion inference from human faces. In addition to better accuracy for human than animal faces, interestingly, we found that participants had more difficulty inferring neutral from human facial displays than the more emotional inferences; a difficulty they did not suffer with the animal inferences. This suggests a difference between how people infer emotions from human and animal faces, perhaps relying on more emotion-specific or mimicry mechanisms for human faces and learning mechanisms for animals. There may also be a predisposition towards inferring emotions from human facial displays that is not present for animals.
The pattern of results for the second study replicates the first, as participants were more accurate inferring emotions from dog than panin facial displays. In addition, we found that there was no species advantage for inferring anger displays, consistent with an evolutionarily preserved mechanism to detect threat. These results provide further support for the role of experience in inferring emotion from facial displays, though with the important caveat that when emotion inference is in service of threat perception, neither experience nor similarity modulates the effect.
Our findings replicate evidence in the literature demonstrating that humans can accurately categorise dog facial displays of emotion, perhaps because of experience with dogs, or changes in dogs' facial morphology to facilitate communication with humans 16,18. We find that participants inferred emotions from dog faces with varying degrees of accuracy. Angry inferences were most accurate, and significantly more accurate than happy inferences from dog faces. However, happy inferences were significantly more accurate than sad and fear inferences; both latter inferences were also significantly more accurate than neutral inferences. This suggests a bias towards inferring the face as a threat (angry) or safety (happy) stimulus, over emotional inferences related to the negative well-being of the animal (fear, sad).
We find that emotion inferences for panins were also comfortably above-chance levels, along with dogs and humans, with only sad inferences in Study 1 just above chance. Also in Study 1, we find evidence consistent with the literature showing that humans more accurately infer happy than sad from macaque faces 24 . This suggests a bias towards accurate inferences from the face as a safety signal. However, Study 2 showed no difference between any of the emotions, urging caution when interpreting the results for panins.
Caution interpreting panin results relative to dogs is also needed when examining the meta-cognitive confidence ratings from participants. They felt most confident in their human inferences, followed by dogs, then panins. Moreover, confidence was correlated with experience for dogs across both studies, and panins only in Study 2, suggesting that more familiarity with the species boosted meta-cognitive self-perceptions of performance. Confidence was also correlated negatively with human inference reaction time in Study 1, and positively correlated with dog inference reaction time in Study 2, suggesting different impacts of confidence on performance across the two species.
Across both studies we found support for an effect of experience on emotion inference; we found significant correlations between experience and accuracy for dog faces in both studies, and panins in Study 1. Experience also correlated with reaction time for dogs across both studies, but not panins, suggesting that experience impacts performance primarily for the domesticated animal.
We found that angry facial displays were recognized regardless of species, providing evidence for an evolutionarily preserved threat detection mechanism regardless of similarity or experience. Interestingly, we found that angry displays were marginally more accurate than neutral displays for panins, and all emotions were inferred less accurately relative to neutral displays for panins. This suggests that perhaps participants used their mirror system to simulate panin facial displays, reducing accuracy for the faces in emotional situations since these displays use different facial muscles than humans.
There are several limitations to our studies, beyond the debate regarding whether animals experience emotions. Dogs use other parts of the body, such as wagging their tail, to express their emotions 11, and humans can even recognise the emotions of dogs just by listening to their barks 26. Indeed, though humans also rely on human body posture 49 and prosody to convey and infer emotions 50, dogs rely on body language to a greater extent than humans do 51. Therefore, dog facial displays may not be the most prominent cue for emotion inference. We also did not design our study to test for participant gender differences, and we collected unequal samples of men and women in both studies.
In addition, facial displays in panins rely on different muscle configurations to communicate emotions than similar human emotional displays, despite the similar facial morphology 41. Nonetheless, humans were better at recognising the emotions of dogs by merely looking at their faces. This indicates that experience, such as the amount of exposure to non-conspecific animals, can override biological roots like homology of facial muscles when humans infer emotions in non-human animals. Additionally, we did not include all emotions in one study, so we cannot compare anger inferences with sad or fear inferences. Study 1 is also more complex than Study 2 because it includes four emotion options rather than three. We also conflated two types of experience in our studies: experience with a domesticated versus a captive animal. Participants in our sample were more likely to encounter domesticated dogs and captive panins, adding a confound to our results. Finally, our participants read stories about dogs and chimpanzees before completing the emotion inference task, and rated them on several characteristics, which could have influenced their performance on the emotion inference task. However, this exploratory variable did not interact with any of our reported main effects or interactions, suggesting that this exploratory task did not differentially impact our participants.
Future research could investigate the role of species domesticity in our understanding of animal facial displays, as well as other factors which affect whether anthropomorphic thinking manifests as increased or decreased accuracy of judgements regarding animal behaviours. Future research could also explicitly manipulate the experience or similarity of an animal, and test whether these variables enhance emotion recognition from facial displays.

Method
Participants. Study 1. We recruited participants using an opportunity sample via social media advertisements and a psychology subject pool. Those who were recruited from the subject pool were awarded course credits for participation. There were no differences based on the source of participants. In total, 147 United Kingdom participants completed the study, comprising 113 female and 31 male participants, as well as 3 participants who identified as other, between the ages of 18 and 71, M age = 32.40 years, SD age = 13.60. Post-hoc power analyses revealed that we had sufficient power to detect a medium effect size, 1−β = 0.79. All participants were informed that they could leave the study at any time without consequence and gave their full informed consent before beginning the study. The study was approved by the University College London Ethics Committee; all experimental protocols were approved by this body, and all methods were carried out in accordance with relevant guidelines and regulations.

Study 2.
We collected data from 132 United States participants via Amazon Mechanical Turk; 36 females and 96 males (M age = 32.02 years, SD age = 8.14, age range: 22-61 years); 51 White participants, 52 Asian participants, 16 Hispanic participants, and 13 participants of Black or other ethnicities. Post-hoc power analyses revealed that we had sufficient power to detect a medium effect size, 1−β = 0.80. Participants were paid $1 USD. We obtained ethical approval from the University College London Ethics Committee; all experimental protocols were approved by this body, and all methods were carried out in accordance with relevant guidelines and regulations.
Materials and procedure. Across two studies, participants viewed facial displays associated with emotional situations or contexts and matched each facial display to the emotion associated with that context. We used 8 images of each facial display for each species-emotion combination, for a total of 96 images in Study 1, and 6 images of each combination in Study 2, for a total of 54 images. We used equal numbers of males and females for human faces.
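The image totals follow directly from crossing species, emotions, and images per combination. As a quick check, the short Python sketch below enumerates the species-emotion cells for each study; the variable and function names are illustrative only.

```python
from itertools import product

SPECIES = ["human", "dog", "panin"]


def stimulus_cells(emotions, images_per_cell):
    """Map each (species, emotion) cell to its number of images."""
    return {(s, e): images_per_cell for s, e in product(SPECIES, emotions)}


study1 = stimulus_cells(["happy", "sad", "fearful", "neutral"], 8)
study2 = stimulus_cells(["happy", "angry", "neutral"], 6)

print(len(study1), sum(study1.values()))  # 12 cells, 96 images
print(len(study2), sum(study2.values()))  # 9 cells, 54 images
```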

Study 1.
We used Qualtrics to build the experiment. Participants viewed facial displays of three species (human, dog, panin), displaying emotions in four situations likely to evoke emotions (happy, sad, scared, neutral). There were 8 photographs for each category, so participants viewed 96 photos in total. The background of the photo was not visible; they were cropped so that only the face was visible to participants to avoid contextual information from the photo being used for emotion inference.
The photos of humans were used with permission from the Radboud database 52. Only photos of people of phenotypically European descent were used (there was no race or ethnic diversity), and gender was balanced for each emotional category. The photographs of dogs were obtained with permission from a professional pet photographer (www.thedogphotographer.co.uk, 2018), and from Shutterstock Images (https://www.shutterstock.com, 2018), and showed a variety of breeds. Dog breeds in which head shape has been heavily selected, such as pugs, were avoided; instead, the photos were primarily of gundogs and mongrels. The numbers of dogs with erect and non-erect ears within each emotional category were balanced to prevent ear shape affecting the perception of the participants. Previous research depicted dogs in situations proven to evoke specific emotional responses, such as fear provoked by the presence of nail clippers 40; therefore, the photos used in this study depicted a range of situations known to evoke specific emotions in most dogs to support the probable presence of the emotion (see Table 3).
This situation-emotion matching approach was also used when selecting photographs of panins (see Table 1). Expert evolutionary anthropologists have developed a facial coding system for chimpanzees (ChimpFACS 53), and they can discriminate between facial displays of chimpanzees 54. One such experienced researcher provided photographs of bonobos and chimpanzees, and kindly categorized them into facial displays based on extensive experience of panin behaviour. These facial displays were then matched to likely emotions based on background information from the photo or photo source. No distinction was drawn between bonobos and chimpanzees (Pan paniscus and Pan troglodytes), and photos were not gender balanced for panins or dogs, as their facial sexual dimorphism, whilst present 55,56, is not clearly visible to most humans.
Participants first read an anthropomorphic or mechanistic short story regarding a dog and a chimpanzee. Then, on a seven-point Likert scale from 'not at all' to 'very', they rated chimpanzees on being organised, neat, careful, in control of its actions, and sensitive in the anthropomorphism condition, and strong, easily hurt, large, dark coloured, and disciplined in the mechanistic condition, while they rated dogs on disciplined, lazy, thoughtful, easily hurt, and loyal in the anthropomorphism condition, and big, in control of its actions, furry, carnivorous, and sensitive in the mechanistic condition. Those data were collected for exploratory reasons and will not be discussed further. Nonetheless, we tested whether this exploratory variable affected responses on the main dependent variable, and found no significant main effects or interactions (all p's > .108 across both studies). Participants clicked the link to the study to be directed to the website to begin. After reading an information sheet and filling in a consent form, they read the anthropomorphic or mechanistic short story followed by the rating questions.
They then saw the photos of the faces in a random order and indicated via mouse clicks which of the four emotions (happy, sad, fearful, neutral) best described the facial display in a forced choice format. There was no time limit. They then indicated their experience with dogs and chimpanzees by describing this experience (e.g. owned a dog for 4 years). We dummy coded these responses for analyses. Next, they rated their overall confidence in categorizing the faces from the three species on a Likert scale from 0 (no confidence) to 100 (very confident). They then completed an individual difference measure of flexible social cognition 57 for exploratory purposes; we do not report the results of this measure. Finally, they provided demographic information, were debriefed, and thanked.
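Because responses were forced choice, the accuracy expected from guessing depends on the number of response options: one in four in Study 1 and one in three in Study 2. The Python sketch below shows how a participant's responses could be scored as a proportion correct and compared with that chance level. The function names and the eight example trials are hypothetical, not taken from the study data.

```python
def chance_level(n_options):
    """Expected proportion correct when guessing among n_options labels."""
    return 1 / n_options


def proportion_correct(responses, targets):
    """Share of trials on which the chosen label matches the target label."""
    if len(responses) != len(targets) or not targets:
        raise ValueError("responses and targets must be non-empty and aligned")
    hits = sum(r == t for r, t in zip(responses, targets))
    return hits / len(targets)


# Hypothetical Study 1 style trials: two images per emotion.
targets = ["happy", "happy", "sad", "sad",
           "fearful", "fearful", "neutral", "neutral"]
responses = ["happy", "happy", "sad", "neutral",
             "fearful", "sad", "neutral", "neutral"]

accuracy = proportion_correct(responses, targets)
print(accuracy, accuracy > chance_level(4))  # 0.75 True
```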

Study 2.
We followed up the initial study with a second study that aimed to replicate these findings in a different sample. We used the same materials and followed the same procedure in Study 2 as Study 1 with a few exceptions. First, we reduced the number of emotions to three by eliminating sad and fearful and replacing them with anger: a negatively valenced approach emotion that serves as a threat signal when inferred in another. This allows us to determine whether all negatively valenced emotions are inferred the same way. Moreover, directed gazes displaying anger suggest that the entity with the face is a threat to the perceiver and therefore should trigger evolutionarily preserved threat-detection mechanisms.
Second, participants pressed a key to indicate their emotion choice rather than clicking the corresponding label on the screen. This change in response style facilitated more efficient and perhaps more rapid responses. It also allows us to determine whether the null effects for reaction time are due to the response style or replicate across response styles.
Third, we built the experiment with Gorilla experiment builder rather than Qualtrics, changing software platforms to rule out that the effect only occurred on a single online survey building platform.
Fourth, we asked about experience with the non-human animals by first having participants indicate the type of experience they had (dogs: dog owner, looked after dog, petted other people's dog, other; chimpanzees: seen them in a zoo, watched them on TV, other) before indicating the amount of time they had this experience. We dummy coded types of experience as in Study 1 (dummy coded experience) and treated it as a separate variable from the amount of time (time experience). This change allowed us to get two experience dependent variables.
Fifth, we reduced the number of images per stimulus type from 8 to 6, resulting in a total of 54 images, to shorten the length of the experiment. Finally, we asked participants to indicate which part of the face they paid most attention to when completing the task for each of the three species for exploratory reasons. All other procedures and materials remained the same.
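To illustrate what dummy coding the experience types might look like, the Python sketch below turns a participant's selected categories into 0/1 indicators and keeps the amount of time as a separate value. The exact coding scheme used in the studies is not specified here, so the indicator layout, the function name, the assumption that several categories may be selected, and the time unit are all hypothetical; only the category labels come from the text.

```python
# Category labels as listed in the procedure; everything else is illustrative.
DOG_TYPES = ["dog owner", "looked after dog",
             "petted other people's dog", "other"]
CHIMP_TYPES = ["seen them in a zoo", "watched them on TV", "other"]


def dummy_code(selected, categories):
    """Return one 0/1 indicator per category (assumed coding scheme)."""
    unknown = set(selected) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {c: int(c in selected) for c in categories}


# Hypothetical participant: owns a dog and has seen chimpanzees in a zoo.
participant = {
    "dog_dummy": dummy_code(["dog owner"], DOG_TYPES),
    "chimp_dummy": dummy_code(["seen them in a zoo"], CHIMP_TYPES),
    "dog_time_years": 4.0,    # time experience, kept as its own variable
    "chimp_time_years": 0.0,  # unit is an assumption
}
print(participant["dog_dummy"])
```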
Data analysis strategy. The data analysis strategy remained the same across both studies. We used SPSS (V27) to analyse the data. The data were checked for normality using Box's Test of Equality of Covariances of Means, and the reaction time data were transformed by Log10; therefore, no outliers were removed from the reaction time or accuracy data. We converted accuracy to a proportion, such that 1.00 reflected perfect accuracy. We aggregated the data by averaging over the 8 (Study 1) or 6 (Study 2) trials in each of the 12 (Study 1) or 9 (Study 2) types of stimuli. To test the hypothesis that species affects categorisation accuracy, we completed Wald chi-square tests. This analysis deviated from our pre-registration of Study 2 where we stated we would perform parametric analyses. We computed a repeated measures ANOVA to investigate the effect of facial display of emotion and species on reaction time, and a second repeated measures ANOVA to determine whether participants expressed different levels of confidence in their ability to categorise the species. We corrected for multiple comparisons on the above inferential statistics with Bonferroni correction where appropriate. We used both Bonferroni corrected significance level and whether the confidence interval range included zero as criteria to determine statistical significance in our simple effect tests. Finally, we correlated experience, confidence, reaction time, and accuracy within species.
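The aggregation and decision steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS workflow used in the studies: the trial tuple layout and function names are hypothetical, reaction times are assumed to be log-transformed per trial before averaging (the order is not specified in the text), and the number of comparisons passed to the Bonferroni step is a made-up value.

```python
import math
from collections import defaultdict


def aggregate(trials):
    """Collapse trials into one row per (participant, species, emotion) cell.

    Each trial is (participant, species, emotion, correct, rt_ms).
    Returns {cell: (proportion_correct, mean_log10_rt)}.
    """
    cells = defaultdict(lambda: {"n": 0, "hits": 0, "log_rt": 0.0})
    for participant, species, emotion, correct, rt_ms in trials:
        cell = cells[(participant, species, emotion)]
        cell["n"] += 1
        cell["hits"] += int(correct)
        cell["log_rt"] += math.log10(rt_ms)  # assumed: transform, then average
    return {key: (c["hits"] / c["n"], c["log_rt"] / c["n"])
            for key, c in cells.items()}


def is_significant(p, ci_low, ci_high, alpha=0.05, n_comparisons=1):
    """Apply both criteria: Bonferroni-corrected alpha and a CI excluding zero."""
    corrected_alpha = alpha / n_comparisons
    ci_excludes_zero = not (ci_low <= 0 <= ci_high)
    return p < corrected_alpha and ci_excludes_zero


# Hypothetical trials for one participant and one stimulus type.
trials = [("p1", "dog", "happy", True, 1000),
          ("p1", "dog", "happy", False, 100)]
print(aggregate(trials)[("p1", "dog", "happy")])

# With a hypothetical family of three comparisons, a contrast with p = .038
# and CI [0.002, 0.09] fails the corrected threshold despite its CI.
print(is_significant(0.038, 0.002, 0.09, n_comparisons=3))  # False
```

Requiring both criteria is the stricter choice: a contrast counts as significant only if it survives the corrected alpha and its confidence interval excludes zero.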